A Trie Based Set Similarity Query Algorithm

Abstract

Set similarity query is a primitive for many applications, such as data integration, data cleaning, and gene sequence alignment. Most existing algorithms are inverted-index based: they usually filter unqualified sets one by one and do not have sufficient support for duplicated sets, which leads to low efficiency. To solve this problem, this paper designs T-starTrie, an efficient trie-based structure for set similarity queries, which naturally groups sets with the same prefix into one node and can filter all sets corresponding to that node at a time, thereby significantly improving candidate generation. In this paper, we find that the set similarity query problem can be transformed into the problem of detecting first-layer matching nodes (FMNodes) on T-starTrie. Therefore, an FLMNode detection algorithm is designed. Based on this, a set similarity query algorithm, TT-SSQ, is implemented by developing a variety of filtering techniques. Experimental results show that TT-SSQ can be up to 3.10x faster than existing algorithms.
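The grouping idea described in the abstract can be illustrated with a small sketch. This is not T-starTrie or TT-SSQ: the abstract does not specify the similarity function or the filters, so the code below assumes Jaccard similarity, a plain prefix trie over sorted tokens, and one simple upper-bound filter. All names (`TrieNode`, `build_trie`, `jaccard_query`) are hypothetical. It only shows how sets sharing a prefix, including duplicated sets, sit under one node and can be discarded together.

```python
from bisect import bisect_right


class TrieNode:
    def __init__(self):
        self.children = {}  # token -> TrieNode
        self.set_ids = []   # ids of all sets ending here (duplicates share a node)


def build_trie(sets):
    """Insert every set as its sorted token sequence (assumed global order)."""
    root = TrieNode()
    for set_id, tokens in enumerate(sets):
        node = root
        for token in sorted(set(tokens)):
            node = node.children.setdefault(token, TrieNode())
        node.set_ids.append(set_id)
    return root


def jaccard_query(root, query, threshold):
    """Return the sorted ids of sets whose Jaccard similarity to query >= threshold."""
    q = sorted(set(query))
    q_set = set(q)
    results = []
    stack = [(root, 0, 0)]  # (node, depth, overlap with the query so far)
    while stack:
        node, depth, overlap = stack.pop()
        if node.set_ids:
            union = len(q) + depth - overlap
            similarity = 1.0 if union == 0 else overlap / union
            if similarity >= threshold:
                results.extend(node.set_ids)  # accept all duplicates at once
        for token, child in node.children.items():
            child_overlap = overlap + (token in q_set)
            mismatches = (depth + 1) - child_overlap
            # Query tokens larger than `token` can still match deeper in the subtree.
            remaining = len(q) - bisect_right(q, token)
            # Upper bound on the Jaccard similarity of every set below `child`.
            bound = (child_overlap + remaining) / (len(q) + mismatches)
            if bound >= threshold:
                stack.append((child, depth + 1, child_overlap))
            # Otherwise the whole subtree is filtered in one step.
    return sorted(results)


sets = [[1, 2, 3], [3, 2, 1], [1, 2, 4], [5, 6], [1, 2, 3, 4]]
print(jaccard_query(build_trie(sets), [1, 2, 3], 0.7))  # prints [0, 1, 4]
```

In this toy run the subtree rooted at token 5 is rejected with a single bound check, and the duplicated sets 0 and 1 are accepted together because they end at the same node. An inverted-index approach would typically examine such candidates individually.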

Similar Resources

A Set Intersection Algorithm Via x-Fast Trie

This paper proposes a simple intersection algorithm for two sorted integer sequences. Our algorithm is based on the x-fast trie, since it provides efficient find and successor operators. We show that our algorithm outperforms a skip-list-based algorithm when one of the sets to be intersected is relatively ‘dense’ while the other one is (relatively) ‘sparse’. Finally, we propose some possi...

Trie Based Subsumption and Improving the pi-Trie Algorithm

An algorithm that stores the prime implicates of a propositional logical formula in a trie was developed in [10]. In this paper, an improved version of that pi-trie algorithm is presented. It achieves its speedup primarily by significantly decreasing subsumption testing. Preliminary experiments indicate the new algorithm to be substantially faster and the trie based subsumption tests to be cons...

gSSJoin: a GPU-based Set Similarity Join Algorithm

Set similarity join is a core operation for text data integration, cleaning, and mining. Previous research work on improving the performance of set similarity joins mostly focused on sequential, CPU-based algorithms. Main optimizations of such algorithms exploit high threshold values and the underlying data characteristics to derive efficient filters. In this paper, we investigate strategies to...

Similarity-Based Query Caching

With the success of the semantic web, infrastructures for storing and querying RDF data are gaining importance. A couple of systems are now available that provide basic database functionality for RDF data. Compared to modern database systems, RDF storage technology still lacks sophisticated optimization methods for query processing. Current work in this direction is mainly focussed on index stru...

Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints

A string similarity join finds similar pairs between two collections of strings. It is an essential operation in many applications, such as data integration and cleaning, and has attracted significant attention recently. In this paper, we study string similarity joins with edit-distance constraints. Existing methods usually employ a filter-and-refine framework and have the following disadvantag...

Journal

Journal title: Mathematics

Year: 2023

ISSN: 2227-7390

DOI: https://doi.org/10.3390/math11010229